📁 Latest file found: retail_sugar_prices_2025-04-16.csv
|  | date | admin1 | admin2 | market | market_id | latitude | longitude | category | commodity | commodity_id | unit | priceflag | pricetype | currency | price | usdprice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1994-01-15 | Gujarat | Ahmadabad | Ahmedabad | 923 | 23.03 | 72.62 | miscellaneous food | sugar | 97 | KG | actual | retail | INR | 13.5 | 0.43 |
| 1 | 1994-01-15 | Karnataka | Bangalore Urban | Bengaluru | 926 | 12.96 | 77.58 | miscellaneous food | sugar | 97 | KG | actual | retail | INR | 13.2 | 0.42 |
| 2 | 1994-01-15 | Maharashtra | Mumbai city | Mumbai | 955 | 18.98 | 72.83 | miscellaneous food | sugar | 97 | KG | actual | retail | INR | 13.8 | 0.44 |
| 3 | 1994-01-15 | Orissa | Khordha | Bhubaneshwar | 929 | 20.23 | 85.83 | miscellaneous food | sugar | 97 | KG | actual | retail | INR | 13.5 | 0.43 |
| 4 | 1994-01-15 | Tripura | West Tripura | Agartala | 921 | 23.84 | 91.28 | miscellaneous food | sugar | 97 | KG | actual | retail | INR | 16.0 | 0.51 |
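The "latest file found" message suggests the notebook picks the newest dated export. A minimal sketch of one way to do that, assuming the naming convention `retail_sugar_prices_YYYY-MM-DD.csv` (inferred from the single file name shown; the helper `latest_file` is hypothetical):

```python
import re
from datetime import date

# Assumed naming convention, inferred from the one file name in the output:
# retail_sugar_prices_YYYY-MM-DD.csv
PATTERN = re.compile(r"^retail_sugar_prices_(\d{4})-(\d{2})-(\d{2})\.csv$")

def latest_file(names):
    """Return the name carrying the most recent date stamp, or None if none match."""
    dated = []
    for name in names:
        match = PATTERN.match(name)
        if match:
            year, month, day = map(int, match.groups())
            dated.append((date(year, month, day), name))
    return max(dated)[1] if dated else None

# Illustrative file names only
candidates = [
    "retail_sugar_prices_2025-03-01.csv",
    "retail_sugar_prices_2025-04-16.csv",
    "notes.txt",
]
print(latest_file(candidates))
```

Parsing the date out of the name avoids relying on file modification times, which change when files are copied.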
📊 Insights from EDA of Retail Sugar Prices (1994–2025)
🧭 1. Overall Trends (Long-Term Behavior)
- Sugar prices in India have risen steadily over the last 30 years.
- From around ₹8/kg in 1994, prices have climbed to over ₹45/kg in recent years.
- The trend component of the decomposition plot shows a strong upward movement, especially during:
  - 2009–2011: a noticeable inflationary spike in sugar prices.
  - 2020–2022: a gradual increase, likely influenced by pandemic-era disruptions.
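The notebook does not show how the trend component was computed. As a rough stand-in, a centered 12-month rolling mean of the monthly average price gives a comparable long-term curve. This is a sketch that assumes the `date` and `price` columns from the table above; `monthly_trend` is a hypothetical helper and the data is synthetic:

```python
import pandas as pd

def monthly_trend(df, window=12):
    """Average price per month across all markets, smoothed with a centered rolling mean."""
    monthly = (
        df.assign(date=pd.to_datetime(df["date"]))
          .set_index("date")["price"]
          .resample("MS")   # one bucket per calendar month
          .mean()
    )
    # min_periods=window leaves the edges as NaN instead of averaging partial windows
    return monthly.rolling(window=window, center=True, min_periods=window).mean()

# Synthetic data for illustration only: three years of steadily rising prices
dates = pd.date_range("1994-01-01", periods=36, freq="MS")
demo = pd.DataFrame({"date": dates, "price": [10 + 0.5 * i for i in range(36)]})
trend = monthly_trend(demo)
print(trend.dropna().head())
```

A 12-month window averages out one full seasonal cycle, so what remains is mostly the long-term movement.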
📈 2. Monthly Average & Seasonality
- Sugar prices fluctuate seasonally, with a pattern that repeats each year.
- Prices tend to dip in April–May and rise again toward the year-end.
- This pattern suggests:
  - A possible link to agricultural harvest cycles or festive demand variations.
    - E.g., Diwali or end-of-year celebrations may create demand surges.
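One way to check a seasonal pattern like this is to average prices by calendar month after dividing out each year's mean, so that cheap early years and expensive recent years weigh equally. The sketch below assumes the `date` and `price` columns; `seasonal_profile` is a hypothetical helper and the data is synthetic, built with a dip in May and a peak in November:

```python
import pandas as pd

def seasonal_profile(df):
    """Mean price per calendar month (1-12), relative to the mean of the same year."""
    d = df.assign(date=pd.to_datetime(df["date"]))
    d["year"] = d["date"].dt.year
    d["month"] = d["date"].dt.month
    # Dividing by the yearly mean removes most of the long-term trend
    d["relative"] = d["price"] / d.groupby("year")["price"].transform("mean")
    return d.groupby("month")["relative"].mean()

# Synthetic data for illustration only: flat prices with a May dip and a November peak
factors = {m: 1.0 for m in range(1, 13)}
factors[5], factors[11] = 0.9, 1.1
rows = [
    {"date": f"{year}-{month:02d}-15", "price": base * factors[month]}
    for year, base in [(2000, 10.0), (2001, 20.0)]
    for month in range(1, 13)
]
profile = seasonal_profile(pd.DataFrame(rows))
print(profile.idxmin(), profile.idxmax())
```

Values below 1 mark months that are cheaper than the yearly average, values above 1 mark more expensive months.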
🧮 3. Monthly Percentage Change
- Most months show mild percentage changes (<5%), but a few months spike or drop sharply:
  - These may correspond to policy changes, import/export controls, or supply shocks.
  - Sudden changes often align with known economic events or weather-related impacts.
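The 5% figure can be turned into a simple filter on month-over-month changes. A minimal sketch, assuming a monthly average price series is already available; `flag_sharp_moves` is a hypothetical helper and the numbers are illustrative, not taken from the dataset:

```python
import pandas as pd

def flag_sharp_moves(monthly, threshold=5.0):
    """Return the month-over-month percentage changes whose magnitude exceeds threshold."""
    # fill_method=None: do not forward-fill gaps before computing the change
    pct = monthly.pct_change(fill_method=None) * 100
    return pct[pct.abs() > threshold]

# Illustrative monthly averages in INR/kg
monthly = pd.Series(
    [40.0, 41.0, 40.5, 46.0, 45.0],
    index=pd.date_range("2024-01-01", periods=5, freq="MS"),
)
sharp = flag_sharp_moves(monthly)
print(sharp)
```

The flagged dates are the ones worth comparing against policy announcements or weather events.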
📦 4. Distribution Insights
- The most common sugar price historically was ₹8, which occurred 21 times, mostly in the early years.
- Overall, sugar prices follow a right-skewed distribution, with values concentrated between ₹20/kg and ₹40/kg.
- This skew is consistent with gradual but sustained inflation in consumer sugar prices.
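The mode and the direction of skew can be read off a price series directly. The sketch below uses pandas' `mode` and `skew`; `describe_distribution` is a hypothetical helper, and rounding to whole rupees before taking the mode is an assumption about how the "most common price" was counted:

```python
import pandas as pd

def describe_distribution(prices):
    """Most common rounded price, how often it occurs, and the sample skewness."""
    rounded = prices.round()
    most_common = rounded.mode().iloc[0]
    return {
        "mode": float(most_common),
        "mode_count": int((rounded == most_common).sum()),
        "skew": float(prices.skew()),  # > 0 means a longer right tail
    }

# Illustrative values only, not taken from the dataset
prices = pd.Series([8, 8, 8, 9, 10, 12, 15, 25, 45], dtype=float)
summary = describe_distribution(prices)
print(summary)
```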
📅 5. Year-wise Variability
- The boxplot shows wider price ranges in later years (post-2010), suggesting:
  - Increased volatility in the market.
  - Possibly driven by global market influences, fuel costs, or climate-driven variability.
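The box widths in a year-wise boxplot correspond to the interquartile range (IQR), so the visual impression can be backed by a number per year. A minimal sketch, assuming the `date` and `price` columns; `yearly_spread` is a hypothetical helper and the data is synthetic:

```python
import pandas as pd

def yearly_spread(df):
    """Interquartile range of price for each calendar year."""
    d = df.assign(year=pd.to_datetime(df["date"]).dt.year)
    grouped = d.groupby("year")["price"]
    return (grouped.quantile(0.75) - grouped.quantile(0.25)).rename("iqr")

# Synthetic data for illustration only: a tight year and a wide year
demo = pd.DataFrame({
    "date": ["2005-01-15", "2005-03-15", "2005-05-15", "2005-07-15", "2005-09-15",
             "2015-01-15", "2015-03-15", "2015-05-15", "2015-07-15", "2015-09-15"],
    "price": [18.0, 19.0, 20.0, 21.0, 22.0, 25.0, 30.0, 35.0, 40.0, 45.0],
})
spread = yearly_spread(demo)
print(spread)
```

Because price levels also rose over time, dividing the IQR by the yearly median may give a fairer comparison of volatility across decades.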
🌡️ 6. Heatmap View
- The heatmap confirms a repeating seasonal cycle:
  - Sugar prices are lower in mid-year months and higher towards the end/start of each year.
- 2009, 2016, and 2020 stand out with unusual spikes, possibly due to external shocks.
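A heatmap of this kind is usually drawn from a year-by-month grid of average prices. The sketch below builds only the grid, assuming the `date` and `price` columns; `year_month_grid` is a hypothetical helper, the rows are illustrative, and the plotting call is left out because the notebook's plotting code is not shown:

```python
import pandas as pd

def year_month_grid(df):
    """Mean price with one row per year and one column per calendar month."""
    dates = pd.to_datetime(df["date"])
    d = df.assign(year=dates.dt.year, month=dates.dt.month)
    return d.pivot_table(index="year", columns="month", values="price", aggfunc="mean")

# Illustrative rows only; two markets report in January 2009
demo = pd.DataFrame({
    "date": ["2009-01-15", "2009-01-15", "2009-12-15", "2010-01-15"],
    "price": [30.0, 32.0, 40.0, 38.0],
})
grid = year_month_grid(demo)
print(grid)
```

Months with no observations show up as NaN, which a heatmap typically renders as blank cells.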
Top states by data completeness:
|  | admin1 | months_of_data |
|---|---|---|
| 0 | Maharashtra | 257 |
| 1 | Tamil Nadu | 251 |
| 2 | Rajasthan | 241 |
| 3 | Orissa | 240 |
| 4 | Himachal Pradesh | 231 |
| 5 | Uttar Pradesh | 230 |
| 6 | Karnataka | 223 |
| 7 | Madhya Pradesh | 219 |
| 8 | Kerala | 217 |
| 9 | Bihar | 215 |

Bottom states by data completeness:
|  | admin1 | months_of_data |
|---|---|---|
| 21 | Uttarakhand | 112 |
| 22 | Andhra Pradesh | 111 |
| 23 | Andaman and Nicobar | 93 |
| 24 | Chandigarh | 92 |
| 25 | Nagaland | 92 |
| 26 | Puducherry | 79 |
| 27 | Goa | 72 |
| 28 | Chhattisgarh | 34 |
| 29 | Sikkim | 21 |
| 30 | Manipur | 13 |
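A completeness ranking like this can be produced by counting the distinct calendar months in which each state has at least one observation. A minimal sketch, assuming the `admin1` and `date` columns from the data table; `months_of_data` is a hypothetical helper and the rows are illustrative:

```python
import pandas as pd

def months_of_data(df):
    """Number of distinct year-month periods with data, per state, most complete first."""
    d = df.assign(month=pd.to_datetime(df["date"]).dt.strftime("%Y-%m"))
    return (
        d.groupby("admin1")["month"]
         .nunique()                      # several markets in one month count once
         .sort_values(ascending=False)
         .rename("months_of_data")
         .reset_index()
    )

# Illustrative rows only
demo = pd.DataFrame({
    "admin1": ["Maharashtra", "Maharashtra", "Maharashtra", "Manipur"],
    "date": ["1994-01-15", "1994-01-15", "1994-02-15", "1994-01-15"],
})
completeness = months_of_data(demo)
print(completeness)
```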
C:\Users\neeti\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1416: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
super()._check_params_vs_input(X, default_n_init=10)
C:\Users\neeti\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1440: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
warnings.warn(
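|  | state | cluster |
|---|---|---|
| 15 | Maharashtra | 0 |
| 28 | Uttar Pradesh | 0 |
| 26 | Telangana | 0 |
| 25 | Tamil Nadu | 0 |
| 23 | Rajasthan | 0 |
| 20 | Orissa | 0 |
| 14 | Madhya Pradesh | 0 |
| 13 | Kerala | 0 |
| 12 | Karnataka | 0 |
| 11 | Jharkhand | 0 |
| 30 | West Bengal | 0 |
| 8 | Gujarat | 0 |
| 6 | Delhi | 0 |
| 2 | Assam | 0 |
| 3 | Bihar | 0 |
| 4 | Chandigarh | 0 |
| 10 | Himachal Pradesh | 0 |
| 0 | Andaman and Nicobar | 1 |
| 16 | Manipur | 1 |
| 24 | Sikkim | 1 |
| 1 | Andhra Pradesh | 2 |
| 22 | Punjab | 2 |
| 9 | Haryana | 2 |
| 29 | Uttarakhand | 2 |
| 7 | Goa | 2 |
| 21 | Puducherry | 2 |
| 5 | Chhattisgarh | 2 |
| 19 | Nagaland | 3 |
| 18 | Mizoram | 3 |
| 17 | Meghalaya | 3 |
| 27 | Tripura | 3 |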
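The cluster labels come from scikit-learn's `KMeans`, as the warning text shows, but the feature matrix used is not visible in the output. The sketch below assumes one row of numeric features per state and four clusters (matching labels 0 to 3); `cluster_states` is a hypothetical helper and the demo points are synthetic. Passing `n_init` explicitly is what the `FutureWarning` asks for:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_states(profiles, n_clusters=4, seed=0):
    """Assign each row (one state's feature vector) to a k-means cluster."""
    model = KMeans(
        n_clusters=n_clusters,
        n_init=10,            # explicit value, so the default-change warning is not raised
        random_state=seed,    # fixed seed for reproducible labels
    )
    return model.fit_predict(profiles)

# Synthetic data for illustration only: two well-separated groups of points
profiles = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.1, 4.9]])
labels = cluster_states(profiles, n_clusters=2)
print(labels)
```

Cluster numbers are arbitrary identifiers, so only the grouping is meaningful, not whether a state lands in cluster 0 or cluster 3.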
🧠 Cluster Interpretations
✅ Cluster 0 – Majority Group (Stable/Aligned Trend)
States: Maharashtra, Uttar Pradesh, Tamil Nadu, Rajasthan, Karnataka, Kerala, Madhya Pradesh, Gujarat, and the other states assigned to cluster 0.
- These states show similar seasonal patterns and, for the most part, consistent data availability.
- They likely follow national sugar price trends.
- Suitable for country-level modeling or for selecting a representative sample group.
🌴 Cluster 1 – Sparse/Irregular or Island Territories
States: Andaman & Nicobar, Manipur, Sikkim
- Often have sparse data or irregular price patterns.
- May have unique supply chains (e.g., non-agricultural or import-dependent).
- Not ideal for standard trend modeling without adjustment.
⚡ Cluster 2 – Semi-distinct / Volatile Trends
States: Andhra Pradesh, Punjab, Haryana, Uttarakhand, Goa, Puducherry, Chhattisgarh
- Show greater price volatility or regional fluctuations.
- Could be influenced by local governance, infrastructure, or supply-demand issues.
- May need separate forecasting models or volatility handling.
🏔️ Cluster 3 – Northeast Focused Outliers
States: Nagaland, Mizoram, Meghalaya, Tripura
- Exhibit distinctive pricing behavior compared to mainland states.
- Influenced by transport costs, geography, or border trade policies.
- Important to treat as a unique regional group in analysis.